Custody backfill sync #7907
base: unstable
Conversation
None
}
}
None => None, // If no DA boundary set, dont try to custody backfill
remove comment
@dapplion would you mind reviewing this please 🙏
While this approach is correct, we can take a simpler approach without blindly copying the machinery of the current backfill sync. Backfill sync is conceptually simpler than forwards sync because we know exactly what data we are expecting. The only annoyance is when we request a range of slots that happens to contain zero blocks; in that case we need the "PendingValidation" step to handle it. However, we can make custody backfill sync lag behind the regular backfill sync and use the roots of the downloaded blocks to know exactly what data we are expecting. Then the sync process is deterministic.

That's it: no need to track participating_peers or have pending validation batches.
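A minimal sketch of the deterministic bookkeeping this suggests, assuming custody backfill lags behind regular backfill so the block roots of each batch are already known (all names here, e.g. `ExpectedColumns` and the placeholder `Hash256`/`ColumnIndex` aliases, are hypothetical, not Lighthouse's actual types):

```rust
use std::collections::HashSet;

/// Placeholder aliases standing in for Lighthouse's real Hash256 / ColumnIndex.
type Hash256 = [u8; 32];
type ColumnIndex = u64;

/// Deterministic bookkeeping for one custody backfill batch: given the roots of
/// blocks we have already backfilled, we know exactly which (block_root, column)
/// pairs to expect from peers.
struct ExpectedColumns {
    missing: HashSet<(Hash256, ColumnIndex)>,
}

impl ExpectedColumns {
    /// Build the expected set from known block roots and the newly added custody columns.
    fn new(block_roots: &[Hash256], new_columns: &[ColumnIndex]) -> Self {
        let missing = block_roots
            .iter()
            .flat_map(|root| new_columns.iter().map(move |col| (*root, *col)))
            .collect();
        Self { missing }
    }

    /// Record a received column; returns false if it was not expected (e.g. an invalid response).
    fn on_column_received(&mut self, block_root: Hash256, column: ColumnIndex) -> bool {
        self.missing.remove(&(block_root, column))
    }

    /// The batch is complete exactly when every expected column has arrived.
    fn is_complete(&self) -> bool {
        self.missing.is_empty()
    }
}
```

With this, an empty slot range is simply a batch whose expected set starts empty, so no "PendingValidation" step is needed.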
SyncServiceMessage::CustodyCountChanged { columns } => {
    // Wait for the current epoch to finalize before starting custody sync
    if let Err(e) = self.custody_sync.wait_for_finalization(columns) {
        tracing::warn!(error = ?e, "Failed to set custody backfill state to awaiting finalization");
Suggested change:
tracing::warn!(error = ?e, "Failed to set custody backfill state to awaiting finalization");
warn!(error = ?e, "Failed to set custody backfill state to awaiting finalization");
@@ -293,8 +301,12 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
    blocks_by_range_requests: ActiveRequests::new("blocks_by_range"),
    blobs_by_range_requests: ActiveRequests::new("blobs_by_range"),
    data_columns_by_range_requests: ActiveRequests::new("data_columns_by_range"),
    custody_sync_data_columns_by_range_requests: ActiveRequests::new(
It's not necessary to introduce a new set of requests; just use data_columns_by_range_requests, and change DataColumnsByRangeRequestId::parent_request_id into an enum that can be either a ComponentsByRangeRequestId or a CustodySyncBatchRequestId.
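A rough sketch of what that could look like; the field layouts below are assumptions, only the identifiers `DataColumnsByRangeRequestId::parent_request_id`, `ComponentsByRangeRequestId` and `CustodySyncBatchRequestId` come from the comment above:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ComponentsByRangeRequestId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CustodySyncBatchRequestId(pub u64);

/// The parent of a data_columns_by_range request becomes an enum, so both range
/// sync and custody backfill sync can share `data_columns_by_range_requests`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum DataColumnsByRangeParent {
    /// Issued by forward range sync / regular backfill.
    ComponentsByRange(ComponentsByRangeRequestId),
    /// Issued by custody backfill sync.
    CustodySyncBatch(CustodySyncBatchRequestId),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DataColumnsByRangeRequestId {
    pub id: u64,
    pub parent_request_id: DataColumnsByRangeParent,
}
```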
@@ -15,6 +15,10 @@ pub enum SyncState {
    /// specified by its peers. Once completed, the node enters this sync state and attempts to
    /// download all required historical blocks.
    BackFillSyncing { completed: usize, remaining: usize },
    /// The node is undertaking a custody backfill sync. This occurs for a node that has completed forward and
    /// backfill sync and has undergone a custody count change. During custody backfill sync the node attempts
"has completed forward and backfill sync"

Is that true? Or is it just:

"The sync is waiting for the earliest_available_data_column_slot's epoch to finalize before starting"

The two comments appear inconsistent.
They both need to be true: range sync and backfill sync must finish, and earliest_available_data_column_slot's epoch must be finalized, before custody backfill sync can start.
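For reference, a possible wording of the doc comment that states both preconditions explicitly; the variant name and fields below are placeholders, since the actual variant added by the PR is not shown in this hunk:

```rust
pub enum SyncState {
    // ...existing variants elided...

    /// The node is undertaking a custody backfill sync. This only starts once range sync and
    /// backfill sync have completed, the node has undergone a custody count change, and the
    /// epoch of `earliest_available_data_column_slot` has been finalized. The node then
    /// attempts to download the data columns it is newly required to custody.
    CustodyBackFillSyncing { completed: usize, remaining: usize },
}
```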
/// A custody backfill sync has completed.
Completed,
/// Too many failed attempts at backfilling. Consider it failed.
Failed,
With my suggestion above, backfill sync can only fail when our current peers send us invalid data too many times. We can resume syncing once a new peer joins. So is the Failed state necessary on its own?
Failed,
/// A custody sync is set to Pending if the node is undergoing range/backfill syncing.
/// It should resume syncing after the node is fully synced.
Pending,
I wonder if we can collapse AwaitingFinalization, Paused, Failed, and Pending into a single state called Pending(reason: String) or Paused(reason: String). Then external events like new peers or finalization call the resume function, which checks that we have finalized, have peers, etc., and then resumes.
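A minimal sketch of that collapsed-state idea; the names and fields are assumptions, not the PR's actual types:

```rust
#[derive(Debug)]
enum CustodySyncState {
    /// Not syncing; the reason records why (awaiting finalization, no suitable
    /// peers, too many failures, regular range/backfill sync still running, ...).
    Paused { reason: String },
    Syncing,
    Completed,
}

struct CustodySync {
    state: CustodySyncState,
}

impl CustodySync {
    /// Called on any external event (new peer, finalization, backfill finished)
    /// that might unblock the sync; re-checks every precondition before resuming.
    fn try_resume(&mut self, finalized: bool, has_peers: bool, regular_sync_done: bool) {
        if let CustodySyncState::Paused { .. } = self.state {
            if finalized && has_peers && regular_sync_done {
                self.state = CustodySyncState::Syncing;
            }
        }
    }
}
```

The single resume path replaces separate transitions out of AwaitingFinalization, Paused, Failed and Pending.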
#[derive(Debug)]
/// A segment of a chain.
pub struct CustodyBatchInfo<E: EthSpec> {
You can generalize range_sync::batch::BatchInfo by making it generic over the data it holds in BatchState::AwaitingProcessing and re-use it for custody backfill. A batch basically does the following:
- Track that a download for X is ongoing
- Handle download retries and track failed peers
- Track that a processing for X is ongoing
- Handle processing retries and track failed peers

This is the same for a range sync batch and a custody batch. Do you really need to track CustodyBatchInfo::columns inside the batch? Given an epoch you should be able to compute the columns in the network_context. Also, if CUSTODY_BACKFILL_EPOCHS_PER_BATCH is 1, we don't need to track start_slot and end_slot; that is legacy code from before Deneb, when each batch had more than one epoch.
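A simplified stand-in for the real range_sync::batch::BatchInfo (names and fields below are illustrative, not the actual Lighthouse definitions), showing the generalization: the batch is generic over the payload it holds while awaiting processing, so range sync could instantiate it with blocks and custody backfill with data columns, sharing the same retry bookkeeping.

```rust
type PeerId = u64; // placeholder for the real PeerId type

enum BatchState<T> {
    AwaitingDownload,
    Downloading { peer: PeerId },
    AwaitingProcessing(T),
    Processing,
}

struct BatchInfo<T> {
    state: BatchState<T>,
    failed_download_peers: Vec<PeerId>,
    failed_processing_attempts: usize,
}

impl<T> BatchInfo<T> {
    fn new() -> Self {
        Self {
            state: BatchState::AwaitingDownload,
            failed_download_peers: Vec::new(),
            failed_processing_attempts: 0,
        }
    }

    /// Record a failed download from `peer` and make the batch retryable.
    fn download_failed(&mut self, peer: PeerId) {
        self.failed_download_peers.push(peer);
        self.state = BatchState::AwaitingDownload;
    }

    /// Record a failed processing attempt and allow a re-download.
    fn processing_failed(&mut self) {
        self.failed_processing_attempts += 1;
        self.state = BatchState::AwaitingDownload;
    }
}
```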
Custody backfill sync refactor
BatchInfo refactor
Issue Addressed

#7603

Proposed Changes

Custody backfill sync service

Similar in many ways to the current backfill service. There may be ways to unify the two services; the difficulty is that the current backfill service tightly couples blocks with their associated blobs/data columns. Any attempt to unify the two services should be left to a separate PR, in my opinion.

SyncNetworkContext

SyncNetworkContext manages custody sync data-columns-by-range requests separately from other sync RPC requests. I think this is a nice separation, considering that custody backfill is its own service.

Data column import logic

The import logic verifies KZG commitments and checks that each data column's block root matches the block root in the node's store before importing the columns.
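A hypothetical sketch of those two checks (the `DataColumn` struct and the `verify_kzg` / `block_root_in_store` helpers are placeholders, not Lighthouse's actual API):

```rust
struct DataColumn {
    block_root: [u8; 32],
    // commitments, proofs, cells, ... omitted
}

fn import_custody_columns(
    columns: Vec<DataColumn>,
    verify_kzg: impl Fn(&DataColumn) -> bool,
    block_root_in_store: impl Fn(&[u8; 32]) -> bool,
) -> Result<Vec<DataColumn>, String> {
    for column in &columns {
        // Reject the batch if any column fails KZG verification.
        if !verify_kzg(column) {
            return Err("KZG verification failed".to_string());
        }
        // The column must correspond to a block already in the store.
        if !block_root_in_store(&column.block_root) {
            return Err("column block root does not match a block in the store".to_string());
        }
    }
    // All checks passed; the caller can now persist the columns.
    Ok(columns)
}
```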
New channel to send messages to SyncManager

Now external services can communicate with the SyncManager. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing mpsc channel that the SyncNetworkContext uses to communicate with the SyncManager. I will spend some time reviewing this.
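A self-contained sketch of that message path; std::sync::mpsc is used here only to keep the example runnable, the real code would use the async channels Lighthouse already has, and only the SyncServiceMessage::CustodyCountChanged variant is taken from the diff above:

```rust
use std::sync::mpsc;

type ColumnIndex = u64;

/// The variant shown in the diff above; any other variants are omitted here.
enum SyncServiceMessage {
    CustodyCountChanged { columns: Vec<ColumnIndex> },
}

fn main() {
    // The SyncManager owns the receiver; external services hold the sender.
    let (sync_service_tx, sync_service_rx) = mpsc::channel::<SyncServiceMessage>();

    // e.g. the validator custody logic notices a custody count change:
    sync_service_tx
        .send(SyncServiceMessage::CustodyCountChanged { columns: vec![3, 17, 42] })
        .expect("sync manager receiver dropped");

    // ...and the SyncManager drains the channel in its event loop:
    while let Ok(SyncServiceMessage::CustodyCountChanged { columns }) = sync_service_rx.try_recv() {
        println!("triggering custody backfill for {} new columns", columns.len());
    }
}
```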
TODOs

- ValidatorRegistrations.epoch_validator_custody_requirements while custody syncing, or once it's completed
- speedo logic

Additional notes

This needs to be thoroughly tested before being included in 8.0.0-rc.0.